Introduction

In recent years the quality of the wines that are sold and consumed has improved markedly, especially since the requirements for the so-called DOC (Denominação de Origem Controlada, controlled designation of origin) began to be defined. These requirements cover the whole winemaking process: the region where the wine is produced, its growing practices, grape variety, harvest, processing, aging and distribution, as well as the characteristics the final product must meet (). In this assignment we explore a dataset that records several physicochemical properties of wines together with a quality score given by professional tasters, and we use clustering methods to study how these properties relate to the quality of the product. We use the Kaggle dataset "Red Wine Quality" (https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009). It contains 1599 "blind" tastings, each described by 11 physicochemical variables plus the quality score. The study was carried out on wines from a region of Portugal.

Data exploration

# Install pacman first if it is not available:
# if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, skimr, GGally, plotly, viridis, caret, randomForest, e1071, rpart, xgboost, h2o, corrplot, rpart.plot, corrgram)
vinotinto <- read.csv("./winequality-red.csv",header=TRUE)
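The column names used in this analysis (fixed_acidity, volatile_acidity, ...) contain underscores. We assume, without having confirmed it for every copy of the file, that some downloads of the CSV have headers with spaces instead, which read.csv() turns into dots. The minimal sketch below shows one way to get underscore names in that case; the two-row CSV text and the object name `vino` are hypothetical.

```r
# Hypothetical two-row CSV whose header uses spaces (an assumption about some copies of the file)
csv_text <- "fixed acidity,volatile acidity,quality\n7.4,0.70,5\n7.8,0.88,5"

# read.csv() passes the header through make.names(), so spaces become dots
vino <- read.csv(text = csv_text, header = TRUE)
print(names(vino))  # names are "fixed.acidity", "volatile.acidity", "quality"

# Replace the dots with underscores to obtain names such as fixed_acidity
names(vino) <- gsub(".", "_", names(vino), fixed = TRUE)
print(names(vino))
```

If the file already has underscore names, as the output below suggests, this step is unnecessary.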


Let's look at the first rows of the file

head(vinotinto)
##   fixed_acidity volatile_acidity citric_acid residual_sugar chlorides
## 1           7.4             0.70        0.00            1.9     0.076
## 2           7.8             0.88        0.00            2.6     0.098
## 3           7.8             0.76        0.04            2.3     0.092
## 4          11.2             0.28        0.56            1.9     0.075
## 5           7.4             0.70        0.00            1.9     0.076
## 6           7.4             0.66        0.00            1.8     0.075
##   free_sulfur_dioxide total_sulfur_dioxide density   pH sulphates alcohol
## 1                  11                   34  0.9978 3.51      0.56     9.4
## 2                  25                   67  0.9968 3.20      0.68     9.8
## 3                  15                   54  0.9970 3.26      0.65     9.8
## 4                  17                   60  0.9980 3.16      0.58     9.8
## 5                  11                   34  0.9978 3.51      0.56     9.4
## 6                  13                   40  0.9978 3.51      0.56     9.4
##   quality
## 1       5
## 2       5
## 3       5
## 4       6
## 5       5
## 6       5

Now we use skim() to get an overview of the file's contents

vinotinto %>% skim() %>% kable()
## Skim summary statistics  
##  n obs: 1599    
##  n variables: 12    
## 
## Variable type: integer
## 
## variable   missing   complete   n      mean   sd     p0   p25   p50   p75   p100   hist     
## ---------  --------  ---------  -----  -----  -----  ---  ----  ----  ----  -----  ---------
## quality    0         1599       1599   5.64   0.81   3    5     6     6     8      ▁▁▁▇▇▁▂▁ 
## 
## Variable type: numeric
## 
## variable               missing   complete   n      mean    sd       p0      p25    p50     p75    p100   hist     
## ---------------------  --------  ---------  -----  ------  -------  ------  -----  ------  -----  -----  ---------
## alcohol                0         1599       1599   10.42   1.07     8.4     9.5    10.2    11.1   14.9   ▂▇▅▃▂▁▁▁ 
## chlorides              0         1599       1599   0.087   0.047    0.012   0.07   0.079   0.09   0.61   ▇▃▁▁▁▁▁▁ 
## citric_acid            0         1599       1599   0.27    0.19     0       0.09   0.26    0.42   1      ▇▅▅▆▂▁▁▁ 
## density                0         1599       1599   1       0.0019   0.99    1      1       1      1      ▁▁▃▇▇▂▁▁ 
## fixed_acidity          0         1599       1599   8.32    1.74     4.6     7.1    7.9     9.2    15.9   ▁▇▇▅▂▁▁▁ 
## free_sulfur_dioxide    0         1599       1599   15.87   10.46    1       7      14      21     72     ▇▇▅▂▁▁▁▁ 
## pH                     0         1599       1599   3.31    0.15     2.74    3.21   3.31    3.4    4.01   ▁▁▅▇▅▁▁▁ 
## residual_sugar         0         1599       1599   2.54    1.41     0.9     1.9    2.2     2.6    15.5   ▇▂▁▁▁▁▁▁ 
## sulphates              0         1599       1599   0.66    0.17     0.33    0.55   0.62    0.73   2      ▂▇▂▁▁▁▁▁ 
## total_sulfur_dioxide   0         1599       1599   46.47   32.9     6       22     38      62     289    ▇▅▂▁▁▁▁▁ 
## volatile_acidity       0         1599       1599   0.53    0.18     0.12    0.39   0.52    0.64   1.58   ▂▇▇▃▁▁▁▁

Let's look at the correlations between the variables

vinotinto %>% cor() %>% corrplot.mixed(upper = "ellipse", tl.cex=.8, tl.pos = 'lt', number.cex = .8)

Below the diagonal the plot shows the correlation coefficients; above it, each pair of variables is drawn as an ellipse whose color and shape reflect the sign and strength of the correlation.
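As a numeric complement to the plot, one can rank every variable by its correlation with quality. The sketch below runs on a small synthetic data frame (`toy`, with hypothetical values in which quality rises with alcohol, falls with volatile_acidity and ignores chlorides) so that it is self-contained; on the real data the same two lines would be applied to vinotinto.

```r
# Synthetic stand-in for the wine data (hypothetical values, not the real dataset)
set.seed(97)
n <- 500
toy <- data.frame(
  alcohol          = rnorm(n, mean = 10.4, sd = 1.1),
  volatile_acidity = rnorm(n, mean = 0.53, sd = 0.18),
  chlorides        = rnorm(n, mean = 0.087, sd = 0.047)  # pure noise with respect to quality
)
toy$quality <- round(5.6 + 0.5 * (toy$alcohol - 10.4) -
                       2 * (toy$volatile_acidity - 0.53) + rnorm(n, sd = 0.5))

# Correlation of every variable with quality, sorted from most negative to most positive
cor_quality <- sort(cor(toy)[, "quality"])
# Drop the trivial self-correlation of quality with itself
cor_quality <- cor_quality[names(cor_quality) != "quality"]
print(round(cor_quality, 2))
```

A correlation close to zero only rules out a linear relationship, so the pairwise plots that follow are still useful for spotting other patterns.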

We will try to find out which variables most influence wine quality. Judging from the previous plot, we might assume that chlorides and sulphates do not influence it; in the following steps we will check whether this holds.

vinotinto %>% 
  mutate(quality = as.factor(quality)) %>% 
  select(-c(sulphates, chlorides)) %>% 
  ggpairs(aes(color = quality, alpha=0.4),
          columns=1:9,
          lower=list(continuous="points"),
          upper=list(continuous="blank"),
          axisLabels="none", switch="both")

We normalize every column except quality using z-scores and analyze the data again:

vinotinto_n_zscore1 <- vinotinto
# Standardize every column to z-scores except quality (column 12), which is kept as the label
vinotinto_n_zscore1[, -12] <- scale(vinotinto[, -12])
vinotinto_n_zscore <- as.data.frame(vinotinto_n_zscore1)
head(vinotinto_n_zscore)
##   fixed_acidity volatile_acidity citric_acid residual_sugar   chlorides
## 1    -0.5281944        0.9615758   -1.391037    -0.45307667 -0.24363047
## 2    -0.2984541        1.9668271   -1.391037     0.04340257  0.22380518
## 3    -0.2984541        1.2966596   -1.185699    -0.16937425  0.09632273
## 4     1.6543385       -1.3840105    1.483689    -0.45307667 -0.26487754
## 5    -0.5281944        0.9615758   -1.391037    -0.45307667 -0.24363047
## 6    -0.5281944        0.7381867   -1.391037    -0.52400227 -0.26487754
##   free_sulfur_dioxide total_sulfur_dioxide    density         pH
## 1         -0.46604672           -0.3790141 0.55809987  1.2882399
## 2          0.87236532            0.6241680 0.02825193 -0.7197081
## 3         -0.08364328            0.2289750 0.13422152 -0.3310730
## 4          0.10755844            0.4113718 0.66406945 -0.9787982
## 5         -0.46604672           -0.3790141 0.55809987  1.2882399
## 6         -0.27484500           -0.1966174 0.55809987  1.2882399
##     sulphates    alcohol quality
## 1 -0.57902538 -0.9599458       5
## 2  0.12891007 -0.5845942       5
## 3 -0.04807379 -0.5845942       5
## 4 -0.46103614 -0.5845942       6
## 5 -0.57902538 -0.9599458       5
## 6 -0.57902538 -0.9599458       5
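The z-score of a value is z = (x - mean(x)) / sd(x), which is what scale() computes column by column. The short check below uses the first six fixed_acidity values shown earlier; because the mean and standard deviation come from only these six values rather than all 1599 rows, the resulting z-scores differ from the ones printed above.

```r
# First six fixed_acidity values from head(vinotinto)
x <- c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4)

# z-score by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)
# scale() does the same; as.vector() drops the matrix shape and attributes it returns
z_scale <- as.vector(scale(x))

print(round(z_manual, 4))
```

After the transformation every column has mean 0 and standard deviation 1, so no variable dominates a Euclidean distance just because of its units.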

Let's make a few visualizations to see which variables are most useful:

vinotinto %>% 
  plot_ly(x=~alcohol,y=~volatile_acidity,z= ~sulphates, color=~quality, hoverinfo = 'text', colors = viridis(3),
          text = ~paste('Quality:', quality,
                        '<br>Alcohol:', alcohol,
                        '<br>Volatile acidity:', volatile_acidity,
                        '<br>Sulphates:', sulphates)) %>% 
  add_markers(opacity = 0.8) %>%
  layout(title = "Wine quality in 3D",
         annotations=list(yref='paper',xref="paper",y=1.05,x=1.1, text="quality",showarrow=F),
         scene = list(xaxis = list(title = 'Alcohol'),
                      yaxis = list(title = 'Volatile acidity'),
                      zaxis = list(title = 'Sulphates')))
vinotinto %>% 
  plot_ly(x=~alcohol,y=~pH,z= ~citric_acid, color=~quality, hoverinfo = 'text', colors = viridis(3),
          text = ~paste('Quality:', quality,
                        '<br>Alcohol:', alcohol,
                        '<br>pH:', pH,
                        '<br>Citric acid:', citric_acid)) %>% 
  add_markers(opacity = 0.8) %>%
  layout(title = "Wine quality in 3D",
         annotations=list(yref='paper',xref="paper",y=1.05,x=1.1, text="quality",showarrow=F),
         scene = list(xaxis = list(title = 'Alcohol'),
                      yaxis = list(title = 'pH'),
                      zaxis = list(title = 'Citric acid')))
vinotinto %>% 
  plot_ly(x=~total_sulfur_dioxide,y=~fixed_acidity,z= ~residual_sugar, color=~quality, hoverinfo = 'text', colors = viridis(3),
          text = ~paste('Quality:', quality,
                        '<br>Total sulfur dioxide:', total_sulfur_dioxide,
                        '<br>Fixed acidity:', fixed_acidity,
                        '<br>Residual sugar:', residual_sugar)) %>% 
  add_markers(opacity = 0.8) %>%
  layout(title = "Wine quality in 3D",
         annotations=list(yref='paper',xref="paper",y=1.05,x=1.1, text="quality",showarrow=F),
         scene = list(xaxis = list(title = 'Total sulfur dioxide'),
                      yaxis = list(title = 'Fixed acidity'),
                      zaxis = list(title = 'Residual sugar')))

We will use the elbow, silhouette and gap statistic (gap_stat) methods to determine the optimal number of clusters.

- The elbow method measures the cost function of the clustering method (for k-means, the total within-cluster sum of squares) as the number of clusters increases; the suggested k is where the curve bends and adding clusters stops reducing the cost substantially.

- The silhouette method measures how similar each object is to its own cluster compared with the other clusters, then compares the average of these values as the number of clusters increases; the k with the highest average silhouette is preferred.

- The gap_stat method compares, for different values of k, the observed total within-cluster variation with its expected value under a uniform reference distribution. The estimated optimal number of clusters is the k that maximizes the gap statistic, that is, the k whose cluster structure is as far as possible from a uniform random distribution. This method can be applied to any clustering method.
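The three criteria can be computed directly with kmeans() and the cluster package, which is roughly what fviz_nbclust() plots for k-means. The sketch below is illustrative only: it uses a hypothetical synthetic dataset `X` with three well-separated groups, so all three criteria should point to k = 3. For the gap statistic it uses cluster::maxSE() with the "firstSEmax" rule, which is one of several possible selection rules.

```r
library(cluster)  # silhouette(), clusGap(), maxSE()

# Synthetic data with three well-separated groups (hypothetical, for illustration only)
set.seed(97)
centers_true <- rbind(c(0, 0), c(5, 0), c(0, 5))
X <- do.call(rbind, lapply(1:3, function(i)
  cbind(rnorm(50, centers_true[i, 1], 0.5), rnorm(50, centers_true[i, 2], 0.5))))

ks <- 2:6
d  <- dist(X)

# Elbow: total within-cluster sum of squares for each k
wss <- sapply(ks, function(k) kmeans(X, centers = k, nstart = 20, iter.max = 50)$tot.withinss)

# Silhouette: s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged over all points
avg_sil <- sapply(ks, function(k) {
  cl <- kmeans(X, centers = k, nstart = 20, iter.max = 50)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})

# Gap statistic: Gap(k) = E*[log W_k] - log W_k, with E* estimated from B reference datasets
gap <- clusGap(X, FUNcluster = kmeans, K.max = 6, B = 50, verbose = FALSE,
               nstart = 20, iter.max = 50)
best_gap_k <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")

print(data.frame(k = ks, wss = round(wss, 1), avg_sil = round(avg_sil, 3)))
print(best_gap_k)
```

Passing a larger iter.max through to kmeans(), as done here, is one way to reduce convergence warnings in the bootstrap runs of the gap statistic.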

library(mclust)
library(cluster)
library(factoextra)
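# Elbow method: total within-cluster sum of squares (wss) for each k
set.seed(97)
fviz_nbclust(vinotinto_n_zscore1[,2:8], kmeans, nstart = 30,  method = "wss")

# Silhouette method: average silhouette width for each k
fviz_nbclust(vinotinto_n_zscore1[,2:8], kmeans, nstart = 30,  method = "silhouette")

# Gap statistic with 500 bootstrap reference samples
fviz_nbclust(vinotinto_n_zscore1[,2:8], kmeans, nstart = 30, method = "gap_stat", nboot = 500)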
## Warning: did not converge in 10 iterations

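Given the values observed here, we consider k between 2 and 4 to perform best. Note that these diagnostics were computed on columns 2 to 8 (volatile_acidity to density), whereas the k-means models below use columns 1 to 10. The repeated "did not converge in 10 iterations" warnings come from the kmeans() runs inside the gap statistic, which use kmeans()'s default iter.max = 10.

We try the different values of k (2 to 5) on columns 1 to 10 (fixed_acidity to sulphates), running k-means 40 times from random starting points (nstart = 40) to avoid getting stuck in local minima.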

set.seed(97)
vinotinto_n_zscore_2 <- vinotinto_n_zscore1
mod_vino_2 <- kmeans(x=vinotinto_n_zscore_2[,1:10], centers=2, iter.max=500, nstart=40)
set.seed(97)
vinotinto_n_zscore_3 <- vinotinto_n_zscore1
mod_vino_3 <- kmeans(x=vinotinto_n_zscore_3[,1:10], centers=3, iter.max=500, nstart=40)
set.seed(97)
vinotinto_n_zscore_4 <- vinotinto_n_zscore1
mod_vino_4 <- kmeans(x=vinotinto_n_zscore_4[,1:10], centers=4, iter.max=500, nstart=40)
set.seed(97)
vinotinto_n_zscore_5 <- vinotinto_n_zscore1
mod_vino_5 <- kmeans(x=vinotinto_n_zscore_5[,1:10], centers=5, iter.max=500, nstart=40)
vinotinto_n_zscore_2["cluster"] <- mod_vino_2$cluster

plot(vinotinto_n_zscore_2[,1:10], col=mod_vino_2$cluster)

vinotinto_n_zscore_3["cluster"] <- mod_vino_3$cluster

plot(vinotinto_n_zscore_3[,1:10], col=mod_vino_3$cluster)

vinotinto_n_zscore_4["cluster"] <- mod_vino_4$cluster

plot(vinotinto_n_zscore_4[,1:10], col=mod_vino_4$cluster)

vinotinto_n_zscore_5["cluster"] <- mod_vino_5$cluster

plot(vinotinto_n_zscore_5[,1:10], col=mod_vino_5$cluster)
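Comparing the scatterplot matrices by eye is subjective. One numeric summary that kmeans() already returns is the share of total variance explained by the partition, betweenss / totss, together with the cluster sizes. The sketch below computes both for k = 2 to 5 on a hypothetical synthetic matrix `X` (three groups in three dimensions); on the real data the same sapply() calls would be applied to mod_vino_2 to mod_vino_5.

```r
# Synthetic stand-in data (hypothetical): three groups of 50 points in three dimensions
set.seed(97)
X <- rbind(matrix(rnorm(150, mean = 0), ncol = 3),
           matrix(rnorm(150, mean = 4), ncol = 3),
           matrix(rnorm(150, mean = 8), ncol = 3))

ks <- 2:5
fits <- lapply(ks, function(k) {
  set.seed(97)  # same seed before each fit, as in the code above
  kmeans(X, centers = k, iter.max = 500, nstart = 40)
})

# Share of the total sum of squares explained by each partition, and the smallest cluster size
explained <- sapply(fits, function(m) m$betweenss / m$totss)
print(data.frame(k = ks, explained = round(explained, 3),
                 smallest_cluster = sapply(fits, function(m) min(m$size))))
```

This ratio tends to grow as k increases, so it is best read alongside the elbow and silhouette results rather than simply maximized; a very small cluster can also flag a group of outliers.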


We now keep k = 4 and again run k-means 40 times from random starting points to avoid getting stuck in local minima.

set.seed(97)
mod_vino <- kmeans(x=vinotinto_n_zscore1[,1:10], centers=4, iter.max=500, nstart=40)

We add the cluster assignments as a column so that we can compare them against the rest of the data frame.

vinotinto_n_zscore1["cluster"] <- mod_vino$cluster
head(vinotinto_n_zscore1)
##   fixed_acidity volatile_acidity citric_acid residual_sugar   chlorides
## 1    -0.5281944        0.9615758   -1.391037    -0.45307667 -0.24363047
## 2    -0.2984541        1.9668271   -1.391037     0.04340257  0.22380518
## 3    -0.2984541        1.2966596   -1.185699    -0.16937425  0.09632273
## 4     1.6543385       -1.3840105    1.483689    -0.45307667 -0.26487754
## 5    -0.5281944        0.9615758   -1.391037    -0.45307667 -0.24363047
## 6    -0.5281944        0.7381867   -1.391037    -0.52400227 -0.26487754
##   free_sulfur_dioxide total_sulfur_dioxide    density         pH
## 1         -0.46604672           -0.3790141 0.55809987  1.2882399
## 2          0.87236532            0.6241680 0.02825193 -0.7197081
## 3         -0.08364328            0.2289750 0.13422152 -0.3310730
## 4          0.10755844            0.4113718 0.66406945 -0.9787982
## 5         -0.46604672           -0.3790141 0.55809987  1.2882399
## 6         -0.27484500           -0.1966174 0.55809987  1.2882399
##     sulphates    alcohol quality cluster
## 1 -0.57902538 -0.9599458       5       3
## 2  0.12891007 -0.5845942       5       3
## 3 -0.04807379 -0.5845942       5       3
## 4 -0.46103614 -0.5845942       6       2
## 5 -0.57902538 -0.9599458       5       3
## 6 -0.57902538 -0.9599458       5       3
mod_vino$centers
##   fixed_acidity volatile_acidity citric_acid residual_sugar   chlorides
## 1   -0.11148164      -0.06440159   0.1525480     0.32651253 -0.05595642
## 2    1.09854372      -0.71525451   1.0052077     0.07605952 -0.04829747
## 3   -0.65364249       0.50865027  -0.7994727    -0.22963550 -0.17652640
## 4    0.03849838      -0.03622914   1.1448824    -0.39397199  5.60356456
##   free_sulfur_dioxide total_sulfur_dioxide    density         pH
## 1          1.09357065            1.1879001  0.1636856 -0.1008910
## 2         -0.58349920           -0.5669563  0.5077236 -0.7268260
## 3         -0.27491378           -0.3668543 -0.4358186  0.6078955
## 4         -0.05496302            0.5066234  0.1098485 -1.6502733
##     sulphates
## 1 -0.06299844
## 2  0.32689861
## 3 -0.32708864
## 4  3.49946941
plot(vinotinto_n_zscore1[,1:10], col=mod_vino$cluster)

plot(vinotinto_n_zscore1[,1:3], col=mod_vino$cluster)

plot(vinotinto_n_zscore1[,2:4], col=mod_vino$cluster)

plot(vinotinto_n_zscore1[,5:8], col=mod_vino$cluster)

plot(vinotinto_n_zscore1[,7:10], col=mod_vino$cluster)
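Two follow-ups help connect the clusters with the data: cross-tabulating cluster against quality, and converting the z-score centers back to original units (center * sd + mean). The sketch below is self-contained and uses a hypothetical two-variable data frame `raw` with a made-up quality rule; with the real data, the same table() and sweep() calls would use vinotinto, mod_vino and the column means and standard deviations of vinotinto.

```r
# Hypothetical stand-in data: two raw variables plus a quality score
set.seed(97)
n <- 300
raw <- data.frame(alcohol   = c(rnorm(n / 2, 9.8, 0.5), rnorm(n / 2, 11.5, 0.5)),
                  sulphates = c(rnorm(n / 2, 0.60, 0.08), rnorm(n / 2, 0.75, 0.08)))
raw$quality <- ifelse(raw$alcohol > 10.6, 6L, 5L)

# Cluster on z-scores, as done above
z <- scale(raw[, c("alcohol", "sulphates")])
set.seed(97)
fit <- kmeans(z, centers = 2, iter.max = 500, nstart = 40)

# 1) Contrast the clusters with the quality label
tab <- table(cluster = fit$cluster, quality = raw$quality)
print(tab)
print(tapply(raw$quality, fit$cluster, mean))  # mean quality per cluster

# 2) Express the centers in original units: center * sd + mean
centers_orig <- sweep(sweep(fit$centers, 2, attr(z, "scaled:scale"), "*"),
                      2, attr(z, "scaled:center"), "+")
print(round(centers_orig, 2))
```

Because z-scoring is a linear transformation, the back-transformed centers equal the per-cluster means of the original variables.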

Now let's see what happens if we cluster the original, unnormalized measurements.

set.seed(97)
# Copy of the raw (unnormalized) data, so that vinotinto itself is left untouched
vinotinto_raw <- vinotinto
mod_vino <- kmeans(x=vinotinto_raw[,1:10], centers=4, iter.max=500, nstart=40)

We add the cluster assignments as a column so that we can compare them against the rest of the data frame.

vinotinto_raw["cluster"] <- mod_vino$cluster
head(vinotinto_raw)
mod_vino$centers
plot(vinotinto_raw[,1:10], col=mod_vino$cluster)

plot(vinotinto_raw[,1:3], col=mod_vino$cluster)

plot(vinotinto_raw[,2:4], col=mod_vino$cluster)

plot(vinotinto_raw[,5:7], col=mod_vino$cluster)

plot(vinotinto_raw[,8:10], col=mod_vino$cluster)

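Looking at the plots, we can see that when the analysis is done on the unnormalized data the dispersion is larger, and a few variables whose values are much larger than the others dominate. This is expected because k-means relies on Euclidean distance: for example, total_sulfur_dioxide ranges from 6 to 289 while density stays between 0.99 and 1 (see the skim() summary), so without rescaling the sulfur dioxide variables largely drive the clustering.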
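The effect can be reproduced on a small synthetic example. In the hypothetical data frame `toy` below, the variable `small` carries the true two-group structure, while `large` is pure noise on a scale of hundreds. On the raw data k-means splits along `large`; after scale() it recovers the true groups. The `agreement()` helper is illustrative, not part of any package.

```r
set.seed(97)
n <- 200
g <- rep(1:2, each = n / 2)  # true groups (hypothetical)
toy <- data.frame(
  small = c(rnorm(n / 2, 0, 0.3), rnorm(n / 2, 3, 0.3)),  # carries the group structure
  large = runif(n, 0, 300)                                # noise on a much larger scale
)

set.seed(97)
cl_raw <- kmeans(toy, centers = 2, nstart = 20)$cluster
set.seed(97)
cl_scaled <- kmeans(scale(toy), centers = 2, nstart = 20)$cluster

# Share of points matching the true groups; labels are arbitrary, so take the better matching
agreement <- function(cl, truth) max(mean(cl == truth), mean(cl != truth))
print(c(raw = agreement(cl_raw, g), scaled = agreement(cl_scaled, g)))
```

On the raw data the agreement stays close to chance, while on the standardized data it is close to 1.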